Search CORE

9 research outputs found

Detection and handling of overlapping speech for speaker diarization

Author: Zelenák Martin
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2012

Over the last several years, speaker diarization has attracted substantial research attention as one of the spoken language technologies used to improve, or enrich, recording transcriptions. Compared to other domains, meeting recordings are more complex because of the spontaneity of speech, reverberation effects, and the presence of overlapping speech. Overlapping speech refers to situations in which two or more speakers are speaking simultaneously. In meeting data, a substantial portion of the errors made by conventional speaker diarization systems can be ascribed to speaker overlaps, since usually only one speaker label is assigned per segment. Furthermore, simultaneous speech included in the training data can corrupt the single-speaker models and thus lead to worse segmentation. This thesis concerns the detection of overlapping speech segments and its application to improving speaker diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on distant-microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis, linear discriminant analysis, or a multi-layer perceptron. In addition, we investigate the possibility of employing long-term prosodic information. The most suitable subset from a set of candidate prosodic features is determined in two steps: first, a ranking according to the mRMR (minimum redundancy, maximum relevance) criterion is obtained, and then a standard hill-climbing wrapper approach is applied to determine the optimal number of features. The novel spatial and prosodic parameters are used in combination with spectral-based features suggested previously in the literature. In experiments conducted on AMI meeting data, we show that the newly proposed features do contribute to the detection of overlapping speech, especially on data originating from a single recording site.
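The two-step prosodic feature selection described above (mRMR ranking, then a hill-climbing wrapper over the number of features) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The sketch assumes that feature values are already quantized into discrete bins, estimates mutual information from simple counts, and uses the difference form of mRMR (relevance minus mean redundancy). `evaluate` is a hypothetical caller-supplied function that scores a feature subset, for example by the overlap detector's accuracy on held-out data.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Count-based estimate of I(X;Y) in nats for two equally long discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr_rank(features: Dict[str, Sequence], labels: Sequence) -> List[str]:
    """Greedy mRMR ranking (difference form): relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    ranking: List[str] = []
    remaining = list(features)
    while remaining:
        def score(name: str) -> float:
            if not ranking:
                return relevance[name]  # first pick: most relevant feature
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in ranking
            ) / len(ranking)
            return relevance[name] - redundancy

        best = max(remaining, key=score)  # ties resolved by input order
        ranking.append(best)
        remaining.remove(best)
    return ranking


def choose_subset_size(
    ranking: List[str], evaluate: Callable[[List[str]], float]
) -> Tuple[int, float]:
    """Hill-climbing wrapper: grow the top-k prefix while the score keeps improving."""
    if not ranking:
        raise ValueError("ranking must not be empty")
    best_k, best_score = 1, evaluate(ranking[:1])
    for k in range(2, len(ranking) + 1):
        current = evaluate(ranking[:k])
        if current <= best_score:
            break  # local optimum reached
        best_k, best_score = k, current
    return best_k, best_score


if __name__ == "__main__":
    y = [0] * 6 + [1] * 6  # 0 = single speaker, 1 = overlap (toy labels)
    feats = {
        "a": [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
        "b": [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # exact copy of "a": redundant
        "c": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
    }
    print(mrmr_rank(feats, y))  # ['a', 'c', 'b']
```

The toy run shows the point of the redundancy term. The copy "b" is as relevant as "a", but it is ranked last because it adds no new information. Stopping at the first non-improving step is what makes the wrapper a hill climber. This keeps it cheap, because each candidate size needs at most one detector evaluation, but it can also stop at a local optimum.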
In speaker diarization, segments containing detected speaker overlap are assigned a second speaker label and are also excluded from model training. The proposed overlap labeling technique is integrated into Viterbi decoding, which is part of the diarization algorithm. During system development, we found that it is beneficial to optimize overlap exclusion and overlap labeling independently with respect to the overlap detection system. We report improvements over the baseline diarization system on both single- and multi-site AMI data. Preliminary experiments with NIST RT data also show a diarization error rate (DER) improvement on the RT'09 meeting recordings. Adding beamforming and a TDOA feature stream to the baseline diarization system, which was aimed at improving the clustering process, makes the overlap labeling algorithm slightly more effective. A more detailed analysis of the overlap exclusion behavior reveals large differences in improvement between individual meeting recordings, as well as between different operating points of the overlap detector. However, high performance variability across recordings is also typical of the baseline diarization system without any overlap handling.
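The overlap handling described above can be illustrated with a simplified, post-hoc sketch. In the thesis, labeling is integrated into Viterbi decoding. Here, purely as an assumption for illustration, each segment carries per-speaker log-likelihood scores from the diarization models (the hypothetical `loglik` field). The second label is taken to be the speaker with the second-highest score. Segments flagged by the overlap detector are dropped from model retraining.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    start: float                 # seconds
    end: float                   # seconds
    loglik: Dict[str, float]     # hypothetical per-speaker model log-likelihoods
    overlap: bool                # decision of an overlap detector
    labels: List[str] = field(default_factory=list)


def handle_overlap(segments: List[Segment]) -> List[bool]:
    """Assign speaker labels in place and return a mask of segments usable for training.

    Non-overlap segments keep only the best-scoring speaker; segments with
    detected overlap also receive the second-best speaker (an assumed rule)
    and are excluded from model training.
    """
    train_mask = []
    for seg in segments:
        ranked = sorted(seg.loglik, key=seg.loglik.__getitem__, reverse=True)
        n_labels = 2 if seg.overlap else 1
        seg.labels = ranked[:n_labels]
        train_mask.append(not seg.overlap)
    return train_mask
```

Keeping exclusion (the returned mask) separate from labeling (the `labels` field) reflects the finding above that the two work best at independently tuned detector operating points. A real system would feed each one from its own detector threshold.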

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Speaker diarization of broadcast news in Albayzin 2010 evaluation campaign

Author: A Hatch
A Tritschler
C Neves
C Vaquero
D Moraru
DA Reynolds
F Castaldo
H Hermansky
Henrik Schulz
J Ajmera
J Gauvain
J Haitsma
J Pelecanos
Javier Hernando
L Rabiner
L Wilcox
M Aguiló
M Diez
M Zelenák
MA Siegler
Martin Zelenák
MW Wheeler
N Dehak
N Mirghafori
R Auckenthaler
R Kuhn
RO Duda
S Chen
S Galliano
S Meignier
T Butko
X Anguera
Publication venue: Springer Science and Business Media LLC

Crossref

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Author: A Abad
A Cardenal-Lopez
A Cardenal-López
A Jansen
A Martin
A Moreno
A Moreno-Sandoval
A Stolcke
Alejandro Coucheiro-Limeres
AM Azmi
Antonio Cardenal
Antonio Miguel
B Logan
B Ma
B Taras
B Zhang
C Ni
C Parada
Carmen Garcia-Mateo
CJ Chen
D Can
D Karakos
D Povey
D Vergyri
Doroteo T. Toledano
F Metze
GJF Jones
H Joho
H Su
H-Y Lee
HVD Heuvel
I Szöke
I-F Chen
J Chiu
J Garofolo
J Li
J Mamou
J Pinto
J Tejedor
J Trmal
J van Hout
Javier Tejedor
JG Fiscus
Julia Olcoz
Julian David Echeverry-Correa
K Iwata
K Thambiratmann
KM Knill
L Docío-Fernández
L Mangu
Laura Docio-Fernandez
LJ Rodríguez-Fuentes
M Bisani
M Cai
M Ma
M Saraclar
M Wollmer
M Zelenák
MJF Gales
MS Seigel
N Rajput
NF Chen
P Yu
Paula Lopez-Otero
R Justo
S Nakagawa
SP Rath
T Ng
T Ohno
T Sakai
V Mitra
V-B Le
X Anguera
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. It is currently receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on several search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness.
This research was also funded by the European Regional Development Fund and the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160).
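The abstract does not name the evaluation metrics. STD evaluations commonly score systems with the NIST Term-Weighted Value (TWV), and the sketch below assumes that metric. Per-term miss and false-alarm probabilities are averaged over the terms that have at least one reference occurrence, using the NIST default weight β = 999.9. The per-term hit and false-alarm counts (the `TermCounts` type is hypothetical) are assumed to come from a separate alignment of detections to reference occurrences, which is not shown. When evaluated at the system's own decision threshold, this value is the actual TWV (ATWV).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TermCounts:
    n_true: int   # reference occurrences of the search term
    n_hit: int    # correct detections of the term
    n_fa: int     # false alarms for the term


def term_weighted_value(
    terms: Iterable[TermCounts], speech_seconds: float, beta: float = 999.9
) -> float:
    """TWV = 1 - mean over terms of (P_miss + beta * P_FA).

    Terms with no reference occurrences are skipped. As in the NIST STD
    convention, the number of non-target trials for a term is approximated
    by speech_seconds - n_true (one trial per second of speech).
    """
    losses = []
    for t in terms:
        if t.n_true == 0:
            continue
        p_miss = 1.0 - t.n_hit / t.n_true
        p_fa = t.n_fa / (speech_seconds - t.n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no term with reference occurrences")
    return 1.0 - sum(losses) / len(losses)
```

The large β makes a single false alarm on 1000 s of speech cost about as much as missing every occurrence of a term. This is why STD systems tune their score threshold carefully.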

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

Repositorio Universidad de Zaragoza

Biblos-e Archivo

Distributed RF sensing framework with radio environment emulation

Author: Duplicy Jonathan
Gameiro Atílio
Quaresma José
Ribeiro Carlos
Zelenák Martin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

This paper introduces a reconfigurable test bed for prototyping and evaluating the performance of advanced distributed RF sensing algorithms, and demonstrates the feasibility of the QoSMOS project sensing architecture. Both the individual sensors and the controller unit are implemented using the open SDR platform GNU Radio. The sensors use USRP hardware. The current implementation uses the energy detection algorithm in the sensors and hard 1-out-of-M combination in the controller unit to generate the final decision on the presence of an incumbent signal. For ease of use, all communication between the controller unit and the sensing elements is carried over standard IP. The test signals imitating the RF scene that the sensors would experience in a real-world deployment are created using Agilent's state-of-the-art radio environment emulator. The significant gains achieved with this distributed sensing test bed are in line with the theoretical results found in the literature.
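The local and global decision rules named above can be sketched as follows. This is a hedged illustration, not the QoSMOS/GNU Radio code. Each sensor compares the mean energy of a block of complex baseband samples with a threshold, and the threshold value is an assumption left to the caller. The controller applies the hard 1-out-of-M rule, which is the OR rule. For M independent sensors with local probabilities p_i, the OR rule gives a global probability of 1 - prod(1 - p_i). This formula holds for detection and for false alarm alike.

```python
import math
from typing import Sequence


def energy_detect(samples: Sequence[complex], threshold: float) -> bool:
    """Local hard decision: declare 'signal present' if mean |x|^2 exceeds the threshold."""
    if not samples:
        raise ValueError("need at least one sample")
    energy = sum(abs(s) ** 2 for s in samples) / len(samples)
    return energy > threshold


def fuse_1_out_of_m(decisions: Sequence[bool]) -> bool:
    """Hard 1-out-of-M (OR) rule: a single positive sensor decides 'present'."""
    return any(decisions)


def or_rule_probability(local_probs: Sequence[float]) -> float:
    """Global P_d (or P_fa) of the OR rule for independent sensors: 1 - prod(1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in local_probs)
```

The OR rule raises the false-alarm probability along with the detection probability. For this reason, the local thresholds are usually raised so that the global false-alarm rate stays at its target.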

Repositório Institucional da Universidade de Aveiro

Palladium-Catalysed Cross-Coupling Reactions Controlled by Noncovalent Zn···N Interactions

Author: Abraham
Ackermann
Adler
Alonso
Amatore
Anselmo
Arockiam
Beletskaya
Beller
Bellina
Besset
Blanco
Blaser
Bocokic
Brase
Brown
Busto
Bélanger
Cabri
Campeau
Carboni
Catti
Cavallo
Chirik
Chu
Cooper
Crabtree
Darensbourg
de Meijere
De Santis
Desiraju
Durot
Dydio
Dzik
Eicher
Elemans
Escudero-Adán
Farina
Fleury-Brégeot
Glorius
Gramage-Doria
Grützmacher
Hartwig
Hassel
Hindson
Hoshino
Hunter
Imamura
Irandoust
Jones
Joule
Jutand
Kadish
Kaim
Kamer
Kanyiva
Karmakar
Kirksey
Kleij
Kuil
Kuwano
Kuwata
Leclerc
Leenders
Lenoir
Lifschitz
Lipiner
Littke
Liu
Luca
Lyaskovskyy
Lüning
Mallet
Martin
Meeuwissen
Metrangolo
Miller
Miyaura
Morisue
Nishio
Noyori
Phan
Pozgan
Raynal
Rothenberg
Samanta
Sandee
Sanderson
Schneider
Schröder
Sheldon
Slagt
Summers
Suslick
Suzuki
Szintay
Thirunavukkarasu
Thordarson
Toma
Troff
Tsuzuki
Ulatowski
van der Vlugt
van Leeuwen
Vasilyev
Vogel
Wang
Wojaczynksi
Yorimitsu
Zarra
Zelenák
Zhao
Zhou
Publication venue: Wiley
Publication date: 01/01/2017

Non-covalent interactions between halopyridine substrates and catalytically inert building blocks, namely zinc(II)-porphyrins and zinc(II)-salphens, influence the catalytic outcome of Suzuki-Miyaura and Mizoroki-Heck palladium-catalysed cross-coupling reactions. The weak Zn⋅⋅⋅N interactions between the halopyridine substrates and the zinc(II)-containing porphyrins and salphens were studied by a combination of ¹H NMR spectroscopy, UV/Vis studies, Job-plot analysis and, in some cases, X-ray diffraction. Additionally, these studies revealed unique supramolecular polymeric and dimeric rearrangements in the solid state featuring weak Br⋅⋅⋅N (halogen bonding), C-H⋅⋅⋅π, Br⋅⋅⋅π and π⋅⋅⋅π interactions. The reactivity of halopyridine substrates in homogeneous palladium-catalysed cross-coupling reactions was found to correlate with the binding strength between the zinc(II)-containing scaffolds and the corresponding halopyridine. This observation is explained by the unfavourable formation of inactive, over-coordinated halopyridine⋅⋅⋅palladium species. The presented approach is particularly appealing for cases in which substrates and/or products deactivate (or partially poison) a transition-metal catalyst.
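Job-plot (continuous variation) analysis is one of the techniques listed above. It estimates binding stoichiometry from the point at which a measured response peaks as the host mole fraction is varied at constant total concentration. The following is a minimal sketch and is not the authors' analysis. It assumes the response values are already computed, for example a ¹H NMR shift change multiplied by the mole fraction. It also assumes that a peak at χ_host = m/(m + n) indicates an H_mG_n complex. Job plots can mislead when binding is weak, so the result is best cross-checked with a full titration fit.

```python
from typing import Sequence


def job_plot_peak(mole_fractions: Sequence[float], responses: Sequence[float]) -> float:
    """Host mole fraction at which the Job-plot response is largest."""
    if len(mole_fractions) != len(responses) or not responses:
        raise ValueError("need equally long, non-empty inputs")
    best = max(range(len(responses)), key=lambda i: responses[i])
    return mole_fractions[best]


def host_guest_ratio(chi_host: float) -> float:
    """Ratio m/n for a complex H_m G_n whose Job plot peaks at chi_host = m / (m + n)."""
    if not 0.0 < chi_host < 1.0:
        raise ValueError("chi_host must lie strictly between 0 and 1")
    return chi_host / (1.0 - chi_host)
```

A peak at a mole fraction of 0.5 corresponds to a ratio of 1, which is the 1:1 host:guest complex expected for a single Zn⋅⋅⋅N contact.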

Crossref

HAL-Rennes 1